image search
MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents
Tao, Xijia, Teng, Yihua, Su, Xinxing, Fu, Xinyu, Wu, Jihao, Tao, Chaofan, Liu, Ziru, Bai, Haoli, Liu, Rui, Kong, Lingpeng
Existing multimodal browsing benchmarks often fail to require genuine multimodal reasoning, as many tasks can be solved with text-only heuristics without vision-in-the-loop verification. We introduce MMSearch-Plus, a 311-task benchmark that enforces multimodal understanding by requiring extraction and propagation of fine-grained visual cues through iterative image-text retrieval and cross-validation under retrieval noise. Our curation procedure seeds questions whose answers require extrapolating from spatial cues and temporal traces to out-of-image facts such as events, dates, and venues. Beyond the dataset, we provide a model-agnostic agent framework with standard browsing tools and a set-of-mark (SoM) module, which lets the agent place marks, crop subregions, and launch targeted image/text searches. SoM enables provenance-aware zoom-and-retrieve and improves robustness in multi-step reasoning. We evaluate closed- and open-source MLLMs in this framework. The strongest system achieves an end-to-end accuracy of 36.0%, and integrating SoM produces consistent gains in multiple settings, with improvements of up to +3.9 points. From failure analysis, we observe recurring errors in locating relevant webpages and distinguishing between visually similar events. These results underscore the challenges of real-world multimodal search and establish MMSearch-Plus as a rigorous benchmark for advancing agentic MLLMs.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
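The SoM workflow described in the abstract above — place a mark on an image, crop that subregion, and launch a targeted search on the crop — can be sketched roughly as follows. This is a hypothetical illustration, not the benchmark's actual API: `crop`, `targeted_query`, and the (top, left, bottom, right) box convention are all assumptions, and the "image" is a toy 2D grid of pixel intensities.

```python
# Hypothetical sketch of a set-of-mark (SoM) style step: the agent marks a
# bounding box on an image, crops that subregion, and builds a targeted
# search query tied to the mark. Names and conventions are illustrative.

def crop(image, box):
    """Crop a 2D pixel grid to the (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

def targeted_query(mark_label, hint):
    """Build a text query that ties a visual mark to a search hint."""
    return f"[mark {mark_label}] {hint}"

# A toy 4x4 "image" of pixel intensities.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patch = crop(image, (1, 1, 3, 3))                  # 2x2 subregion
query = targeted_query("A", "stadium signage date")
```

In a real agent the cropped patch would be handed to an image-search tool while the query string goes to text search; keeping the mark label in both is what makes the retrieval provenance-aware.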
Reviews: Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
The main problem for me is that the paper promises a very realistic scenario (Figure 1) in which a user refines a search through a sequence of refined queries. However, the majority of the model design and evaluation (except Section 4.2) is performed with dense region captions that have almost no sequential nature. While this is partially a strength, since no additional labels are required, the method seems suited especially to such disconnected queries: there is space for M disconnected queries, and only then are updates required. Examining this would provide a deeper understanding of when the proposed method works better. The user queries in Figure 1 seem very natural, but the simulated queries are not.
The A.I. Memed My Dead Dad. Who Do I Sue?
Scrolling through X--ugh, I deleted the app, so now I use the browser to look at it on my phone--a post from Farhad Manjoo caught my eye. It's a screen cap of a picture of five elderly men dressed like veterans sitting on a plane. Below the photo it says, "The real heroes are not in Hollywood." If you look a little more closely, it screams janky A.I. Which commercial airliner has five seats in a row next to the window? God knows what army they belong to: There are eagles, and stripes, but no stars.
- North America > United States > Oregon (0.05)
- North America > United States > New York (0.05)
- Europe > United Kingdom > England (0.05)
FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion
Cazenavette, George, Sud, Avneesh, Leung, Thomas, Usman, Ben
Due to the high potential for abuse of GenAI systems, the task of detecting synthetic images has recently become of great interest to the research community. Unfortunately, existing image-space detectors quickly become obsolete as new high-fidelity text-to-image models are developed at blinding speed. In this work, we propose a new synthetic image detector that uses features obtained by inverting an open-source pre-trained Stable Diffusion model. We show that these inversion features enable our detector to generalize well to unseen generators of high visual fidelity (e.g., DALL-E 3) even when the detector is trained only on lower fidelity fake images generated via Stable Diffusion. This detector achieves new state-of-the-art across multiple training and evaluation setups. Moreover, we introduce a new challenging evaluation protocol that uses reverse image search to mitigate stylistic and thematic biases in the detector evaluation. We show that the resulting evaluation scores align well with detectors' in-the-wild performance, and release these datasets as public benchmarks for future research.
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
- North America > United States > Massachusetts (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
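The intuition behind the inversion features in the FakeInversion abstract above can be illustrated with a toy analogue (this is NOT the paper's actual pipeline, which inverts Stable Diffusion): a generator's outputs lie on a low-dimensional manifold, so the residual between an image and its projection back onto that manifold is near zero for synthetic images and larger for real ones. Here the "generator manifold" is simply the line y = 2x + 1, and all names are illustrative.

```python
# Toy analogue of inversion-based synthetic-image detection: points on the
# "generator manifold" (the line y = 2x + 1) round-trip exactly through
# inversion, while off-manifold points leave a residual.

def project_to_generator_line(p):
    """Orthogonally project point p = (x, y) onto the line y = 2x + 1."""
    x, y = p
    t = (x + 2 * (y - 1)) / 5          # scalar position along direction (1, 2)
    return (t, 2 * t + 1)

def inversion_residual(p):
    """Distance between p and its reconstruction from the 'generator'."""
    qx, qy = project_to_generator_line(p)
    return ((p[0] - qx) ** 2 + (p[1] - qy) ** 2) ** 0.5

def looks_synthetic(p, threshold=0.1):
    """Small residual => the point is reproducible by the generator."""
    return inversion_residual(p) < threshold
```

The actual detector feeds the inversion features to a learned classifier rather than thresholding a single residual, but the design choice is the same: measure how well a known open-source generator can reproduce the input.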
Unsupervised Learning of Spoken Language with Visual Context
Humans learn to speak before they can read or write, so why can't computers do the same? In this paper, we present a deep neural network model capable of rudimentary spoken language acquisition using untranscribed audio training data, whose only supervision comes in the form of contextually relevant visual images. We describe the collection of our data comprised of over 120,000 spoken audio captions for the Places image dataset and evaluate our model on an image search and annotation task. We also provide some visualizations which suggest that our model is learning to recognize meaningful words within the caption spectrograms.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Jordan (0.04)
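The image search and annotation task mentioned in the abstract above boils down to cross-modal retrieval: embed the spoken caption and each candidate image, then return the image with the highest similarity. A minimal sketch, assuming toy embedding vectors in place of the paper's learned spectrogram and image encoders:

```python
# Minimal cross-modal retrieval sketch: score (audio caption, image) pairs
# by cosine similarity of their embeddings and return the best match.
# The embeddings here are toy vectors, not learned encoder outputs.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    return dot(u, v) / ((dot(u, u) ** 0.5) * (dot(v, v) ** 0.5))

def retrieve(audio_emb, image_embs):
    """Index of the image embedding most similar to the audio embedding."""
    scores = [cosine(audio_emb, img) for img in image_embs]
    return max(range(len(scores)), key=scores.__getitem__)
```

Recall@k on such a retrieval task is the standard way to quantify whether the model has associated words in the spectrogram with visual content.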
On Image Search in Histopathology
Tizhoosh, H. R., Pantanowitz, Liron
Histopathology images can be acquired from camera-mounted microscopes or whole-slide scanners. Utilizing similarity calculations to match patients based on these images holds significant potential in research and clinical contexts. Recent advancements in search technologies allow for nuanced quantification of cellular structures across diverse tissue types, facilitating comparisons and enabling inferences about diagnosis, prognosis, and predictions for new patients when compared against a curated database of diagnosed and treated cases. In this paper, we comprehensively review the latest developments in image search technologies for histopathology, offering a concise overview tailored for computational pathology researchers seeking effective and efficient image search methods in their work.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Minnesota > Olmsted County > Rochester (0.04)
- Europe > United Kingdom > England (0.04)
- Research Report (0.82)
- Overview (0.68)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.46)
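The "compare against a curated database of diagnosed cases" step in the abstract above is, at its core, nearest-neighbor search over image embeddings. A hedged sketch under toy assumptions — the embeddings, case labels, and the plain L2 metric are all illustrative, and real systems use learned patch encoders over whole-slide images:

```python
# Sketch of similarity search over a curated case database: embed a query
# slide patch, rank diagnosed cases by embedding distance, and surface the
# diagnoses of the k nearest cases. All data below is toy data.

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def top_k(query_emb, database, k=2):
    """database: list of (embedding, diagnosis). Return k nearest diagnoses."""
    ranked = sorted(database, key=lambda item: l2(query_emb, item[0]))
    return [diag for _, diag in ranked[:k]]

cases = [
    ([0.9, 0.1], "adenocarcinoma"),
    ([0.8, 0.2], "adenocarcinoma"),
    ([0.1, 0.9], "benign"),
]
suggestions = top_k([0.85, 0.15], cases)
```

At scale the brute-force `sorted` call would be replaced by an approximate nearest-neighbor index, which is exactly the "fast and efficient" axis the survey reviews.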
Could YOU spot a deepfake? Scientists find humans struggle to detect AI speech even when they've been trained to look out for it
Humans are unable to detect over a quarter of speech samples generated by AI, researchers have warned. Deepfakes are fake videos or audio clips intended to resemble a real person's voice or appearance. There are growing fears this kind of technology could be used by criminals and fraudsters to scam people out of money. Now, scientists have discovered people can only tell the difference between real and deepfake speech 73 per cent of the time. While early deepfake speech may have required thousands of samples of a person's voice to be able to generate original audio, the latest algorithms can recreate a person's voice using just a three-second clip of them speaking.
Deep Fake video of Biden in drag promoting Bud Light goes viral, as experts warn of tech's risks
Deep fake videos of President Joe Biden and Republican frontrunner Donald Trump highlight how the 2024 presidential race could be the first serious test of American democracy's resilience to artificial intelligence. Videos of Biden dressed as trans star Dylan Mulvaney promoting Bud Light and Trump teaching tax evasion inside a quiet Albuquerque nail salon show that not even the nation's most powerful figures are safe from AI identity theft. Experts say that while today it is relatively easy to spot these fakes, it will be impossible in the coming years because technology is advancing at such a fast pace. There have already been glimpses of the real-world harms of AI. Just earlier this week, an AI-crafted image of black smoke billowing out of the Pentagon sent shockwaves through the stock market before media factcheckers could finally correct the record.
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.25)
- North America > United States > Virginia (0.05)
- Europe > Russia (0.05)
- Asia > Russia (0.05)
Google adds more context and AI-generated photos to image search
Google is adding some new features to its image search function to make it easier to spot altered content, the company announced at its I/O 2023 keynote Wednesday. Photos shown in search results will soon include an "about this image" option that tells users when the image and ones like it were first indexed by Google. You can also learn where it may have appeared first and see other places where the image has been posted online. That information could help users figure out whether something they're seeing was generated by AI, according to Google. For example, you'll be able to see if the image has been on fact-checking websites that point out whether an image is real or altered.
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.63)
- Information Technology > Artificial Intelligence > Natural Language > Generation (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)